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Abstract 

The ability to reveal the regulatory architecture of genes at the whole-genome level by constructing a regulatory net- 
work is critical for understanding the biological processes and developmental programmes of organisms. Here, we 
conducted an eQTL-guided function-related co-expression analysis to identify the putative regulators and construct 
gene regulatory network. We performed an eQTL analysis of 210 recombinant inbred lines (RILs) derived from a cross 
between two indica rice lines, Zhenshan 97 and Minghui 63, the parents of an elite hybrid, using data obtained by 
hybridizing RNA samples of flag leaves at the heading stage with Affymetrix whole-genome arrays. Making use of 
an ultrahigh-density single-nucleotide polymorphism bin map constructed by population sequencing, 13 647 eQTLs 
for 10 725 e-traits were detected, comprising 5079 c/s-eQTLs (37.2%) and 8568 frans-eQTLs (62.8%). The analysis 
revealed 138 frans-eQTLs hotspots, each of which apparently regulates the expression variations of many genes. 
Co-expression analysis of functionally related genes within the framework of regulator-target relationships outlined 
by the eQTLs led to the identification of putative regulators in the system. The usefulness of the strategy was demon- 
strated with the genes known to be involved in flowering. We also applied this strategy to the analysis of QTLs for yield 
traits, which also suggested likely candidate genes. eQTL-guided co-expression analysis may provide a promising 
solution for outlining a framework for the complex regulatory network of an organism. 
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Introduction 

Expression of genes is co-ordinately regulated in space and 
time in an organism to deliver the gene products when needed, 
according to developmental programmes and in response to 
environmental cues (Schmid et al, 2005; Wang et al, 2010b). 
The ability to reveal the regulatory architecture of genes at 
the whole-genome scale by constructing a regulatory network 
is critical for understanding the biological processes and 
genome function of an organism (Davidson et al, 2002; Bar- 
Joseph et al, 2003; Shinozaki et al, 2003). 



In the past, technical approaches have been developed in two 
major ways to characterize gene expression regulation. The 
first is the detection technology of gene expression, ranging 
from traditional analysis of one gene at a time using Northern 
blotting, reverse transcription-PCR and real-time quantita- 
tive PCR (qPCR), to large-scale analysis of many genes, such 
as microarrays composed of all the annotated genes in an 
entire genome (Schena et al, 1995; Tan et al, 2003), and more 
recently by RNA sequencing (Wang et al , 2009). 



Abbreviations: eQTL, expression quantitative trait locus; iGA, iterative group analysis; LOD, logarithm of the odds; NIL, or nearly isogenic line; PC, probability of 
change; qPCR, real-time quantitative PCR; pQTL, phenotypic quantitative trait locus; RIL, recombinant inbred line; SI, support interval; SNP, single-nucleotide 
polymorphisms; SR, standardized residue. 
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In parallel, there has also been the development of 
approaches to determine regulatory relationships of the genes. 
It is a common practice in the literature to identify the down- 
stream targets of an altered gene using mutants or nearly iso- 
genic lines (NILs) to analyse expression profiles for genes of 
interest, and then to expand the regulatory pathways stepwise 
using more mutants or NILs (Doi et al. , 2004; Espinosa-Soto 
et al, 2004; Xue et al, 2008). Another approach involves 
co-expression analysis by calculating correlations between 
genes based on large volumes of expression data, which can 
also provide useful information on how likely the gene func- 
tions are to be related (van Noort et al, 2003; Zhang and 
Horvath, 2005; Wang et al, 2010b). Expression quantitative 
trait loci (eQTLs) are genomic regions that regulate the gene 
expression variations in a population. Analysis of eQTLs 
that treats the expression values of the genes as a quantitative 
trait (e-trait) to perform QTL analysis using a genetically seg- 
regating population appears to be a useful strategy for sug- 
gesting regulatory relationships between genes (Gilad et al. , 
2008; Kliebenstein, 2009). eQTL analysis can detect genetic 
elements that regulate the expression variation of the e-trait 
acting in cis if the QTL is located in the vicinity of an e-trait, 
or in trans if it is located in a distant position. Such analy- 
sis could suggest the existence of a potential regulator in a 
genomic region for a gene (Hansen et al, 2008; Kliebenstein, 
2009). Moreover, one regulator could be the target of another 
regulator. Thus, connecting such regulator-target relation- 
ships revealed by multiple e-traits and eQTLs of different 
layers would lead to a net-like structure, providing basic ele- 
ments for the construction of a regulatory network. More 
recently, an approach combining eQTL with co-expression 
analysis was proposed to identify regulator candidates under- 
lying eQTLs, which may greatly refine the resolution of the 
analysis (Terpstra et al, 2010; Flassig et al, 2013). 

We attempted to construct a regulatory network adopting 
an eQTL-guided co-expression analysis strategy using flag 
leaf, a tissue that contributes significantly to grain yield as 
a primary source of carbohydrate production, from a pop- 
ulation of 210 recombinant inbred lines (RILs) in rice. We 
showed that the analysis could identify candidate regulators 
and their targets down to the level of single genes. We verified 
the usefulness of this approach using genes in the flowering 
pathways. The results suggest that this analysis may provide a 
promising solution for outlining the framework for the com- 
plex regulatory network of an organism. 

Materials and methods 

Plant material 

The population used in the study contained 210 RILs derived by sin- 
gle-seed descent from a cross between two indica rice lines, Zhenshan 
97 and Minghui 63. The RILs and two parents were planted in a 
seed bed and transplanted to an experimental farm at Huazhong 
Agricultural University in the rice growing seasons in Wuhan, 
China. Ten seedlings were planted for each of the RILs in two dif- 
ferent batches at 45 d apart as two biological replicates. Flag leaves 
from three random plants of each RIL per replicate were collected 
at the heading stage (day of the panicle emergence; Supplementary 
Fig. SI available at JXB online) between 8:00 and 9:30 a.m. 



RNA extraction, isolation, and Agilent Bioanalyzer quality test- 
ing were conducted for each of the two biological replicates. High- 
quality RNA samples of the two replicates were mixed (1:1) for 
cDNA synthesis, labelling, microarray hybridization and scanning. 
All processes were conducted by CapitalBio Corporation (Beijing, 
China). 

NIL(mh7) and Zhenshan 97, which differ only by an introgressed 
genome segment, for the network verification were planted and the 
tissue samples collected in the same way as the RIL population. 

Microarray analysis 

R platform (http://www.R-project.org) and Affy' packages from 
bioconductor (http://www.bioconductor.org) were applied to man- 
age all the CEL files. Robust multiarray average (RMA) analysis 
was used for background correction and probe set expression value 
collection. Probe sets identified by MAS 5.0 algorithm as 'P' or 'M' 
for at least one-third of the 210 RILs were employed as e-traits for 
eQTL analysis. 

eQTL and phenotypic QTL (pQTL) analyses 

eQTL and classical pQTL analyses were based on a genetic map 
consisting of 1619 recombinant bins constructed using high-quality 
single-nucleotide polymorphisms (SNPs) from resequencing of the 
210 RILs (Yu et al., 2011). The R/qtl function dm (Broman et al, 
2003) was employed for eQTL and pQTL mapping using a 10 cM 
scan window and five marker covariates (Potokina et al. , 2008; Yu 
et al, 2011). The logarithm of the odds (LOD) threshold for eQTL 
identification was obtained based on a global permutation test that 
randomly selects 100 e-traits from all 21 929 e-traits to do 1000 per- 
mutations (West et al, 2007; Wang et al, 2010a). The eQTL addi- 
tive effect and variation explained by the eQTL were determined 
using the linear QTL model described by Yu et al (2011). With 
the ultrahigh-density SNP bin map, the LOD thresholds of pQTL 
analysis for 10 yield-associated traits at P=0.05 ranged from 4.76 to 
5.10 using 1000 permutation test for each phenotypic trait, with an 
average LOD value of 4.97. We used 5.0 as the threshold for pQTL 
identification. 



Real-time PCR analysis 

RNA was isolated from NIL(mh7) and Zhenshan 97 rice flag leaves 
at the heading stage using TRIzol reagent (Invitrogen). First-strand 
cDNA was synthesized from DNase I-treated total RNA using 
Superscript III reverse transcriptase (Invitrogen) according to the 
manufacturer's instructions. qPCR was performed with gene-spe- 
cific primers (Supplementary Table SI available at JXB online) in 
Applied Biosystems ViiA™ 7 using SYBR Premix Ex Taq (Takara) 
for six biological replicates of each sample. The program was 95 °C 
for 10 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 34 s, 
and relative expression levels were calculated by the 2~ AACT method 
(Livak and Schmittgen, 2001) compared with levels of expression of 
the ubiquitin gene. 

Accession codes 

The National Center for Biotechnology Information Gene 
Expression Omnibus microarray data have been submitted under 
accession number GSE49020. 



Results 

Distribution of e-traits 

In total, 44 029 (76.7%) of the 57 194 probe sets on the 
GeneChip Rice Genome Array each had a locus support in 
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the Rice Genome Pseudomolecules (Release version MSU 
6.1) representing 35 885 genes. In addition, 21 929 (49.8%) of 
the 44 029 probe sets corresponding to 17 934 genes were iden- 
tified as 'present' in more than one-third of 210 RILs using the 
'mas5calls' algorithm (Liu et al. , 2002) and were thus regarded 
as e-traits. We mapped the e-traits to the ultrahigh-density 
SNP bin map consisting of 1619 bins (Yu et al, 2011). The 
distribution of e-traits was highly uneven across the genome 
(Supplementary Fig. S2 available at JXB online), and their 
numbers in the 1619 bins deviated significantly from the expec- 
tation based on physical sizes of the bins as shown by a % 2 
test (x 2= 47 264, P=2.2E-16) using the chisq.test function for 
a goodness-of-fit test. Further evaluation using the standard- 
ized residue (SR), which follows an asymptotical normal dis- 
tribution, showed that the numbers of e-traits in 274 bins were 
significantly different from the expectations (Supplementary 
Table S2 available at JXB online). The density of the genes 
and e-traits in the centromeric and pericentromeric regions 
was lower than that in other chromosomal regions. 

Genome-wide eQTL mapping 

These e-traits were subjected to eQTL analysis based on the 
ultrahigh-density SNP bin map using composite interval map- 
ping (Zeng, 1993, 1994). We applied an LOD threshold of 4.95 
(P-0.05), which was determined by a global permutation test, to 
identify eQTLs. An eQTL was regarded as cw-acting if the gene 
(e-trait) was within the 1.5 LOD-drop support interval (SI) of 
the corresponding eQTL, or as /raws-acting if the gene (e-trait) 
was not located within the 1.5 LOD-drop SI (Keurentjes et al, 
2007). These processes eventually resulted in 13 647 eQTLs for 
10 725 e-traits (Fig. 1 and Supplementary Table S3 available at 
JXB online), including 5079 ds-eQTLs (37.2%) and 8568 trans- 
eQTLs (62.8%) (Table 1). Only one eQTL was detected for 8258 
(77.0%) of the 10 725 e-traits, and more than two eQTLs were 
detected for 407 (3.79%) of the e-traits. Most of the m-eQTLs 
had greater LOD values and effects on gene expression than 
frara-eQTLs (Fig. 1 and Supplementary Fig. S3 available at 
JXB online). Overall, 69.2% of the m-eQTLs each explained 
>20% of the expression variation for the corresponding e-traits, 
while only 8.79% of the trans-eQTLs could explain >20% of the 
expression variation for the corresponding e-traits. 

Trans-eQTL hotspots 

Four chromosomes (1, 3, 6, and 9) had higher numbers of 
eQTLs than expected based on both physical and genetic 
maps (Table 1), suggesting possible eQTL hotspots, as indi- 
cated by the horizontal bands (trans-QQTL hotspots) in Fig. 1. 
Comparison of observed and expected numbers of trans- 
eQTLs showed that trans-eQTLs detected in bins on some 
chromosomes (1, 3, 6, 7, and 9) were obviously more than the 
e-traits with cw-eQTLs (Supplementary Table S4 available at 
JXB online). We also performed a x 2 test for goodness of fit 
between observed and expected frazw-eQTLs followed by cal- 
culating the SR, and a hypergeometric test for observed trans- 
eQTLs and e-traits with cz'.v-eQTLs in the bins. It was found 
that the numbers of rra«.y-eQTLs in 138 bins were significantly 




Physical position of probe sets (Mb) 

Fig. 1. eQTLs identified using flag leaf tissue at the heading stage 
of RILs from a cross between Zhenshan 97 and Minghui 63. The 
positions of 1 3 647 eQTLs for 1 0 725 e-traits in the genome are 
shown. The x-axis shows the physical positions of expressed 
probe sets (e-traits) in the genome, and the y-axis the physical 
positions of eQTLs. Each dot represents an eQTL detected. The 
diagonal indicates possible c/s-eQTLs, which are co-located with 
their corresponding probe sets. All the off-diagnal points indicate 
frans-eQTLs. The 12 rice chromosomes are separated by grey 
lines. The color key reflects the LOD scores (LOD scores >30 
are set to 30). Most of the dots in higher scores are located in 
the diagonal representing c/s-eQTLs. Multiple horizontal bands 
(frans-eQTL hotspots) are shown on four chromosomes (1 , 3, 6, 
and 9), which suggests that transcript levels for many of the probe 
sets are associated with the polymorphisms in these regions. LOD 
scores >4.95 were adopted as the cut-off point for eQTLs. 

more than the expectations based on the sizes of the bins 
( J P<0.01) and the numbers of e-traits with cw-eQTLs (7 > <0.01) 
(Supplementary Table S4 available at JXB online), indicating 
both /ram-eQTL hotspots and coldspots. In particular, the top 
five bins of the trans-eQTL hotspots (Bin851 on chromosome 
6; Bin857 on chromosome 6, Bin37 on chromosome 1, Bin 394 
on chromosome 3, and Binl012 on chromosome 7) ranging 
from 20 to 50 kb harboured 439, 190, 169, 124, and 111 trans- 
eQTLs, respectively. 

Identifying candidate master regulators 

A regulator at the high-node level in the network, or master 
regulator, frequently regulates many genes involved in mul- 
tiple biological processes. We thus reasoned that: (1) such a 
regulator should correspond to a trans-eQTL hotspot, and 
(2) it should be co-expressed (positively or negatively) with 
the genes (targets) it regulates. To validate such an asser- 
tion and find candidate master regulators, we performed a 
co-expression analysis by calculating correlations between 
e-traits with cw-eQTLs and e-traits whose rra/M-eQTLs are 
located in the 1.5 LOD-drop SI of the cw-eQTLs (Fig. 2A). 



1072 | Wang etal. 

Table 1 . Statistics of the eQTLs on each chromosome 



Chr. 


eQTLs on chr. 


C/'s-eQTL 


Trans-eOJL 


Chr. size (Mb) 


Expected" 


SR b 


Chr. size (cM) 


Expected 0 


SR" 


cM/Mb 


1 


2563 


943 


1620 


43.3 


1588 


24.5 


201 


1684 


21.4 


4.6 


2 


929 


463 


466 


35.9 


1316 


-10.7 


175 


1473 


-14.2 


4.9 


3 


1855 


579 


1276 


36.4 


1335 


14.3 


188 


1574 


7.1 


5.2 


4 


706 


379 


327 


35.3 


1294 


-16.4 


127 


1068 


-11.1 


3.6 


5 


608 


373 


235 


29.9 


1096 


-14.7 


116 


974 


-11.7 


3.9 


6 


2453 


410 


2043 


31.2 


1144 


38.7 


144 


1212 


35.6 


4.6 


7 


1135 


290 


845 


29.7 


1089 


1.4 


135 


1137 


-0.1 


4.6 


8 


610 


326 


284 


28.4 


1041 


-13.4 


120 


1010 


-12.6 


4.2 


9 


1093 


356 


737 


23 


843 


8.6 


107 


900 


6.4 


4.7 


10 


509 


332 


177 


23.1 


847 


-11.6 


85 


716 


-7.1 


3.7 


11 


739 


380 


359 


28.5 


1045 


-9.5 


117 


980 


-7.7 


4.1 


12 


447 


248 


199 


27.5 


1008 


-17.7 


109 


918 


-15.5 


4.0 




13647 


5079 


8568 


372.2 


13647 




1625 


13647 




4.4 



a Expected number of eQTLs based on chromosome size from physical map. x 2 = 3690.2 (P < 2.2e-1 6) for the test of goodness of fit between 
the observed and expected numbers of eQTLs on the 1 2 chromosomes. 

b SR: for physical map, standardized residue [=(observed - expected)A/expected], which follows a normal distribution asymptotically. 

An absolute SR value >2.33 indicates statistical significance at P<0.01 . A positive value indicates that the observed number is greater than 

expected. 

c Expected number of eQTLs based on chromosome size from genetic map. % 2 = 2801 .1 8 (P < 2.2e-1 6). 
d SR: standardized residue for genetic map. 



The threshold for significant correlations (0.60 and -0.52, 
P<0.01) was appraised based on 1 000 000 pair-wise corre- 
lations from two randomly selected groups each containing 
1000 e-traits. Six genes seemed to be putative master regula- 
tors, as each was associated with more than 100 genes as the 
targets (Table 2), and other 27 genes obtained 20-100 candi- 
date targets (Supplementary Table S5 available at JXB online). 
A gene ontology analysis based on biological processes of the 
associated targets for the six genes revealed that the targets 
for two of the putative regulators (Os.l2583.1.Sl_at anno- 
tated as expressed protein and Os.53449.1.Al_at annotated 
as OMP85 family protein) seemed to be highly enriched in 
certain functional categories (Supplementary Table S6 avail- 
able at JXB online). No obvious enrichment was detected 
for the other four putative master regulators, indicating that 
these putative master regulators might be involved in diverse 
regulatory pathways. 

Network construction for functionally related genes 

Gene regulatory networks usually regulate the expression of 
arrays of genes in specific spatial and temporal patterns. To 
explore the feasibility of using the expression data to identify 
regulatory networks, we performed co-expression analysis of 
genes known to be involved in the long-day flowering path- 
way. Many studies have revealed that, as a short-day plant, 
rice flowering is regulated by both a short-day activation 
pathway and a long-day suppression pathway (Hayama et al. , 
2003). Our RIL populations were planted in the rice grow- 
ing season (May-September) in Wuhan, China, in which the 
natural day length is 13-14.5 h (nearly a long day). 

An iterative group analysis (iGA) algorithm (Breitling 
et al. , 2004; Keurentjes et al. , 2007) was used to analyse the 
e-trait expression profile to find master regulators of the 
flowering genes, based on the assumption that a concerted 



expression variation was controlled by the same regulator 
and that the functions of the genes were physiologically rel- 
evant. In this process (Fig. 2B, C), a putative local regula- 
tor gene that has a «'.v-eQTL co-located with a trans-eQTL 
was assigned a group name, and an e-trait with a trans-eQTL 
that co-mapped with the group was assigned as a member 
of the group. We first sorted the putative targets (members) 
according to the expression correlations with potential regu- 
lators (groups) in descending order, and moved along the list 
counting the targets of each candidate regulator. When a new 
target was encountered each time, we inquired: How likely is 
it that we would observe this many targets of the regulator 
that high up in the list by chance? For each group, we defined 
the probability of change (PC) value, which indicated how 
likely a regulator with cw-eQTL is to show a strong corre- 
lation with several members with trans-eQTLs of a selected 
group of genes. A regulatory group with a significant PC 
value (P<0.05) would be regarded as a regulator controlling 
the members contributing to this PC value. 

We illustrated this procedure using the flowering regulation 
genes as an example. We identified 577 unique loci annotated 
as known or predicted genes in the category GO:0009908 
(flower development), as well as ones known to be involved in 
flower development and the flowering time pathway from the 
literature (Supplementary Table S7 available at JXB online). 
There were 280 eQTLs associated with 177 of the 577 unique 
probe sets including 109 cis-eQTLs and 171 trans-eQTLs 
(Supplementary Table S8 available at JXB online). Sixty- 
one of the 109 genes with m-eQTLs had a SI overlapping 
the SI of trans-eQTLs. We calculated expression correlations 
between the 61 genes (potential regulators) and 171 genes 
with trans-eQTLs (potential targets), which amounted to 
7808 gene pairs when disregarding the genes that had both 
cis- and trans-eQTLs (Supplementary Table S9 available at 
JXB online). The correlation coefficients were rank ordered. 
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Fig. 2. Schematic diagram of the procedure for identification of master regulators and construction of regulatory network for functionally 
related genes. (A) Identification of master regulators in the trans-eQJL hotspots by eQTL mapping. (B) eQTL map of functionally 
related genes. (C) The process of seeking master regulators and their targets according to the expression correlation between putative 
regulators and candidate targets for constructing a regulatory network (see main text for details). 



iGA (Breitling et al. , 2004) was applied to select the most sig- 
nificant groups and the significant change targets contribut- 
ing to PC value, which indicate how likely a regulator with 
c/.y-eQTL is to observe a strong correlation with several mem- 
bers with trans-eQTLs of a selected group of genes. 

Hd3a and RFT1, essential for flowering in rice, share 91% 
identity in their deduced amino acid sequences and are located 
only 11.5kb apart on chromosome 6 (Komiya et al., 2008). 
A cw-eQTL was detected for RFT1, but the expression varia- 
tion of Hd3a mapped no significant locus. To avoid the obscur- 
ing effect of sequence identity between the genes, the expression 
levels of RFT1 and Hd3a were recalculated using a subset of 
four probes selected from the 1 1 probes in the chip for each 
gene that had more than five SNPs between RFT1 and Hd3a. 
With this analysis, the effect of the cw-eQTL for RFT1 was 



greatly elevated with the LOD increasing from 7.33 to 76.02 
(Supplementary Table S10 and Supplementary Fig. S4 availa- 
ble at JXB online), and the amount of variation explained from 
9.9 to 12.3%. A trans-eQTL for Hd3a (LOD=5.2, i? 2 =9.2%), 
associated with Bin 1006 (chromosome 7), emerged from this 
analysis (Supplementary Table S10 and Supplementary Fig. 
S5 available at JXB online). At the cis-eQTL for RFT1 on 
chromosome 6, the Minghui 63 allele had a positive effect on 
RFT1 expression. However, RFT1 also had a trans-eQTL on 
chromosome 1, at which the Zhenshan 97 allele had a positive 
effect on RFT1 expression. Consequently, the expression levels 
of RFT1 did not seem to be very different in the two parents 
(Supplementary Fig. S6 available at JXB online). In contrast, 
the two parents showed large difference in the expression level 
of Hd3a, apparently due to the trans-eQTL on chromosome 7. 
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Eight obvious regulatory groups were identified (,P<0.05) 
(Table 3) based on the iGA analysis, and the targets for 
each regulator are shown in Fig. 3, with the expression cor- 
relations displayed in Supplementary Table Sll available at 
JXB online. Ghd7, Hdl, and RFT1 were identified as master 
regulators in this analysis, which were the trans-eQTLs for 
several flowering-related genes. The fra/7.s-eQTL for Hd3a 
appeared to be Ghdl (cor=-0.30, _P=8.63E-6). Several trans- 
eQTLs for flowering-related genes seemed to be RFT1, of 
which OsMADSH and OsMADS15 were identified as tar- 
gets of RFT1 in a previous work (Komiya et al, 2008, 2009). 
Meanwhile, OsGF14b and OsGF14c, two 14-3-3 proteins, act 
as intracellular receptors for rice Hd3a florigen (Taoka et al. , 
2011), and were identified as the targets of RFT1 in our 
results. This is in agreement with the conclusion that RFT1 
was the major regulator for long-day flowering (Komiya 
et al, 2009). Ehdl and OsMADS50, confirmed as upstream 



regulators of RFTllHd3a, both had OYww-eQTLs co-located 
with Hdl on chromosome 6. Highly significant nega- 
tive expression correlations of Hdl with Ehdl (cor=-0.54, 
i>=4.06E-17) and OsMADS50 (cor=-0.27, P=7.51E-05) in 
210 RILs indicated that they both were downregulated by 
Hdl, suggesting Hdl as a regulator of Ehdl and OsMADS50, 
Others have reported that Hdl and Ehdl are independent 
in flowering regulation (Doi et al, 2004; Takahashi et al, 
2009), and a more recent study showed that Ehdl appears 
to be negatively regulated and Ghd7 was positively regulated 
by Hdl under long-day conditions (Song et al, 2012), which 
seems to substantiate our network. We obtained the 2kb 
genomic sequence upstream of the ATG site of Ehdl and 
OsMADS50, and found no difference between Zhenshan 
97 and Minghui 63, suggesting that this 2kb sequence may 
not be the cause of the expression difference. In contrast, 
OsMADS50 was positively regulated by Ghdl (cor=0.25, 



Table 2. Information on the six putative master regulators genes, associated with more than 100 genes as targets 

The candidate regulators were suggested to be the gene with c/s-eQTLs that co-mapped with frans-eQTLs of many co-expressed genes 
(P<0.01). 



Regulator 




Bin 


Chr. 


LOD" 


Inf. Mb c 


Sup. Mb" 


Var 6 


Annotation' 


Os.3539.1.S1_at 


412 


Bin851 


6 


5.61 


2.89 


4.50 


7.45% 


Vesicle-associated membrane 
protein, putative, expressed 


Os.1 2583.1. S1_at 


213 


Bin389 


3 


14.77 


8.66 


9.53 


22.27% 


Outer-membrane protein, 
OMP85 family, putative, 
expressed 


Os.53449.1.A1_at 


204 


Bin395 


3 


6.80 


8.75 


10.04 


1 1 .39% 


Expressed protein 


Os.9960.1.S1_at 


145 


Bin857 


6 


23.56 


3.29 


4.02 


35.33% 


WD-40 repeat family protein, 
putative, expressed 


Os.16410.1.S1_s_at 


127 


Bin853 


6 


4.96 


2.89 


4.50 


4.61 % 


BSD domain containing 
protein, expressed 


Os.27513.1.A1_a_at 


103 


Bin36 


1 


5.45 


4.75 


6.04 


5.17% 


Phosphoenolpyruvate car- 
boxylase, putative, expressed 



a N, the number of co-expression probe sets with frans-eQTLs located in the 1 .5 LOD-drop interval of corresponding c/s-eQTL of the e-trait. 

b LOD, the LOD value for the c/s-eQTL for the regulator. 

c lnf.Mb, the inferior position for the 1 .5 LOD-drop interval of c/s-eQTLs. 

d Sup.lv1b, the superior position for the 1 .5 LOD-drop interval of c/s-eQTLs. 

"Var, the expression variation explained by the c/s-eQTL for the regulator. 

'Annotation, the gene annotation for the regulator. 



Table 3. The eight regulatory groups identified in the analysis 



Group 3 


Members 6 


Number changed 0 


P value changed d 


Gene 


MUS6.1 


Os.1 5230.1. S1_at 


24 


19 


1.14E-05 


RFT1 


LOC_Os06g06300 


Os.1 0204.1. S1_at 


10 


7 


1 .08E-04 


Os01g09590 


LOC_Os01g09590 


Os.1 2795. 1.S1_at 


2 


2 


7.68E-03 


OsPRR59 


LOC_Os1 1 g05930 


OsAffx.28467.1.S1_at 


11 


11 


0.016 


Ghd7 


LOC_Os07g 15770 


Os.1 2674. 1.S1_s_at 


2 


2 


0.022 


Os03g11340 


LOC_Os03g11340 


Os.1 189.1. S1_at 


6 


6 


0.032 


Hd1 


LOC_Os06g 16370 


Os.30077.2.S1_at 


5 


3 


0.033 


Os03g 16210 


LOC_Os03g16210 


Os.2720.1 .S1_at 


3 


1 


0.042 


OsPEP 


LOC_Os10g41440 



a Group, All local regulatory genes were classified as group names. 

^Members, the number of e-traits with frans-eQTLs co-mapped with the corresponding group. 
c Number changed, the number of candidate targets that may be controlled by the regulator. 

d P value changed, the probability of change, indicating the likelihood of observing the members of the group that high up in the list by chance. 
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Fig. 3. The regulatory network of genes involved in flower development and the flowering time pathway. Genes in yellow are putative 
regulators, while other genes connected to them in black are the candidate targets. Arrows indicate positive regulation and bars indicate 
negative regulation. 



P=2.05E-4) with a trans-eQTL (LOD=14.1, i? 2 =20%) in the 
genomic position of Ghd7. 

To confirm the network, we checked the transcript abundance 
of the candidate downstream targets for modulator of Ghd7 
using flag leaves at heading date from Zhenshan 97 and NIL 
(mh7), which have an identical genetic background except for 
the introgressed segment (Xue et al, 2008). The qPCR results 
showed that the transcript levels of OsFTLlO, OsFTL12, and 
Hd3a were lower in NIL(mh7) than in Zhenshan 97, while the 
reverse was the case for OsMFTl, OsMADS50, OsMADS55, 
and OsMADS16 (Fig. 4). This was consistent with the result 
presented in the network analysis (Supplementary Table Sll 
available at JXB online). However, the situation for llg31540 
was different from the analysis of the population data (Fig. 3). 
Moreover, OsSPLU and OsFTLl, putative targets of Ghd7 
that were identified with very low correlations (0.19, ,P=0.007; 
0.24, P=4.8E-4), did not show significantly different gene 
expression between NIL(mh7) and Zhenshan 97. These dis- 
crepancies may be due to differences in the genetic material 
(RILs vs NILs) and/or environmental factors, as the cw-eQTL 
for Ghdl could not fully explain the expression variation of 
the gene in the population (LOD=67.6, 7? 2 =68.5%). 

Application of cis-eQTLs for gene discovery of 
phenotypic traits 

Genetic variations for gene expression and phenotypic quanti- 
tative traits are mostly the results of DNA sequence polymor- 
phisms. Combining pQTL mapping with eQTL analysis may 



facilitate investigation of the molecular regulatory networks 
underlying quantitative traits. A gene with cw-eQTLs usu- 
ally has functional polymorphisms in promoter and/or gene- 
coding regions, which may cause variations in gene expression 
levels and phenotypic effects. Indeed, phenotypic variations 
in a number of QTLs for phenotypic traits result from gene 
expression variation due to sequence polymorphisms (Chu 
et al, 2006; Weng et al, 2008; Li et al, 2011). Flag leaves at 
the heading stage, the tissue used in this analysis, have a cru- 
cial role for rice yield. We thus used data for yield and yield- 
related traits for QTL mapping to identify candidate genes 
controlling the variation of the traits. With the ultrahigh- 
density SNP bin map, the LOD thresholds at P=0.05 ranged 
from 4.76 to 5.10 for 1000 permutation test, with an average 
LOD value of 4.97, and we selected 5.0 as the threshold for 
pQTLs (Supplementary Table S12 available at JXB online). 
We then calculated correlations between phenotypic trait 
scores and gene expression levels of e-traits with c^-eQTLs 
with Sis overlapping the corresponding pQTLs. The genes 
with significant correlations with the trait scores are shown 
for each trait in Supplementary Table S13 (available at JXB 
online). Interestingly, a pleiotropic QTL controlling the num- 
ber of grains per panicle, plant height, heading date, yield per 
plant, and flag leaf width was shown to be associated with 
Binl006 on chromosome 7. Three genes (LOC_Os07gl6030, 
LOC_Os07gl5670, and LOC_Os07gl5770) with larger-effect 
cw-eQTLs were regarded as putative candidate genes, and 
exhibited significant correlations with the phenotypes of the 
four traits (Supplementary Table S14 available at JXB online). 
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Hd3a 
P=0.0015 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



OsMADS50 
P=1.03e-7 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



OsMADS55 
p=4.18e-7 




Ghd7 
NIL(Minghui 63) 



OsMADS16 
P=0.017 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



OsMIFl 
P=0.041 



OsFTLlO 
P=0.0013 



OsFTL12 
P=0.0017 



llg31540 
P=0.0043 




Ghd7 Zhenshan 97 

NIL(Minghui 63) 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



Ghd7 Zhenshan 97 

NIL(Minghui 63) 



Fig. 4. qPCR verification for the expression abundance of candidate targets in NIL(mh7) and Zhenshan 97. Flag leaves at the heading 
date were used for the analysis. The y-axis represents relative expression levels. Means ±standard error are based on six biological 
replicates. 



The cloned gene Ghd7 (LOC_Os07gl5770) has been shown to 
be a pleiotropic QTL for number of grains per panicle, plant 
height, heading date, and yield per plant (Xue et aL, 2008), 
and Zhenshan 97 lacked the Ghdl locus while Minghui 63 had 
a functional allele. This confirmed a cw-eQTL with a remark- 
able effect on the expression level of Ghd7 (LOD= 67.6, 
i? 2 =68.5%). 

Discussion 

Construction of a regulatory network is the basis for depict- 
ing regulation of gene expression at the whole-genome level, 
which is fundamental for an understanding of biological 
processes and the functions of both individual genes and the 
genome. A classical gene expression network is constructed 
based on one gene perturbation at a time, making use of 
genetic materials such as mutants and NILs. Although such 
analysis could provide reliable information on the immedi- 
ate downstream target(s) of the perturbed gene, it may not 
be able to provide useful information for regulators and/or 
targets that are in more distant positions or upstream of 
the perturbed gene in the network. eQTL mapping seeks to 
understand the genetic basis of gene expression variation and 



identify causal relationships between gene expression and 
genetic variation by analysing multifactorial natural pertur- 
bations (Jansen and Nap, 2001; Logsdon and Mezey, 2010). 
However, like pQTLs, an eQTL corresponds to a genomic 
region of variable size, usually ranging from hundreds of 
kilobases to megabases, which still requires huge efforts to 
identify possible candidate genes. 

In this study, we have presented a simple, yet effective 
method for gene regulatory network reconstruction using 
an eQTL-guided function-related co-expression analysis. We 
combined data for gene expression from a rice RIL popula- 
tion, gene functional annotation of prior knowledge and iGA 
with the co-expression level to identify candidate regulators. 
Such analysis is based on the hypothesis that a regulator in 
a high node of the network regulates many genes and would 
show ?raw.s-acting effects in the form of fra/M-eQTLs for many 
e-traits that show a correlated expression profile at the popu- 
lation level, and the targets and regulator would be involved 
in the same biological process. 

We used genes involved in the flowering regulation as an 
example to demonstrate the utility of the method in con- 
structing a gene regulatory network, and identified and vali- 
dated with high confidence a number of elements and the 



pathways known to be involved in flowering regulation. Our 
results demonstrated that the strategy adopted here could 
precisely identify a putative regulator down to the single gene, 
representing a great improvement compared with ordinary 
eQTL analysis, which detected the trans-eQTL as an inter- 
val harbouring many genes. Compared with co-regulation 
analysis by simple correlations, the strategy here was able 
to provide a causal relationship between the co-expressed 
genes, which also represents a major improvement. In addi- 
tion, the analysis revealed other genes in the network that 
have not been identified previously as elements in flowering 
regulation. Whether they really represent genes in the sys- 
tem that are yet to be discovered remains to be confirmed, 
but these may be used as clues for future experiments. 

We further expanded such eQTL-guided co-expression 
analysis to the identification of candidate genes for pQTLs. 
The results also demonstrated the promise of this approach, 
which may lead to the identification of candidate genes con- 
trolling quantitative traits. 

Another point to be discussed is that the tissue used in 
this study was flag leaf at the heading stage, at which time 
the plant is already flowering, well after flowering induction. 
This raises a question about the functions of these genes 
and the regulatory network, suggesting that they may also 
have functions in other biological processes besides flower- 
ing regulation. In fact, it has been shown that flowering- 
related genes often exhibit pleiotropic effects that cannot 
be explained by the extended length of the vegetative stage 
alone (Yan et al, 2011, 2013). For example, Ghd7 affects 
grain number, plant height, and heading date under long-day 
conditions (Xue et al, 2008), and was also reported as con- 
trolling flag leaf area (Tan et al. , 2012). Combination of Hdl 
and Ehdl in rice can reduce the number of primary branches 
in a panicle, resulting in smaller spikelet numbers per pani- 
cle (Endo-Higashi and Izawa, 2011). Pleiotropic effects of 
flowering-related genes have also been reported in other 
plants. Arabidopsis LHP1 controls developmental pathways 
involved in organ and cell size (Gaudin et al, 2001). The 
potato tuberization transitions are controlled by two differ- 
ent FT-like paralogues (StSP3D and StSP6A) that respond 
to independent environmental cues (Navarro et al, 2011). 
Another observation is that flowering-related genes are not 
only expressed in the panicle, but also in the leaf, sheath, 
stem, hull, shoot, seedling, and even roots (Supplementary 
Fig. S7 available at JXB online) (Wang et al. , 2010b). All this 
evidence suggests that these firmly regarded 'flowering genes' 
have more functions than just flowering induction. However, 
whether the compositions and architectures of the regula- 
tory networks in terms of the genes and pathways are the 
same in different processes remain to be determined. 

Gene expression is a dynamic process depending on devel- 
opment stages, and different tissues or cells, as well as genetic 
backgrounds and growth environments (Jiao et al, 2009; 
Wang et al, 2010b). Thus, the results and interpretations of 
the eQTL analysis are also valid in terms of the tissues from 
which the RNA samples were collected. Whether the regula- 
tory network and/or co-expression relationships between the 
genes identified in a single tissue study are tissue specific, or 
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are to some extent also valid in other tissues, remains to be 
investigated. 

It should also be noted that the outcome of the eQTL detec- 
tion is critically dependent on the genetic composition of the 
mapping population defined by the polymorphisms between 
the parents. For example, Komiya et al. (2009) confirmed that 
the OsMADS50-Ehdl-RFTl pathway is involved in floral 
activation under long-day conditions. However, no c^-eQTL 
was detected for either OsMADS50 or Ehdl, probably because 
of no sequence difference in the promoter regions of these 
genes between the parents of our population. Consequently, 
neither OsMADS50 and Ehdl was identified as a trans-eQTL 
for other e-traits, and hence were not suggested as regulators 
by the eQTL analysis. Clearly, the parental genotypes pose a 
limiting factor for the extent to which the components in the 
regulatory network can be discovered in the analysis. Multiple 
populations of diverse parents may be used to provide com- 
plementary information for the network construction. 

A further challenge is to identify the ra-acting element in 
each case that regulates the expression variation of the e-trait, 
or the causal agent of the czs-eQTL. The definition of cis- 
effect in eQTL analyses is somewhat arbitrary, varying from 
5Mb (Morley et al, 2004) to lOkb of the gene (Yvert et al, 
2003), or a 1.5 LOD-drop SI of the QTL (Keurentjes et al, 
2007). In principle, a cw-eQTL should be caused by polymor- 
phisms within a neighbouring range of the gene, including 
polymorphisms within the gene itself that alter the ra-act- 
ing elements or sites for mRNA stability, or polymorphisms 
between parents in the promoter region (Heap et al, 2009; 
Smith et al, 2011). The above definition does not exclude 
the possibility that the regulatory element may act in trans, 
as in the case of the AMN1 gene in Saccharomyces cerevisiae 
that was found to be a local but fra«s-acting through feed- 
back regulation due to a coding polymorphism in its tran- 
script (Ronald et al, 2005). In any case, it is essential to find 
sequence polymorphisms between the parents and associate 
such polymorphisms with the expression level variation of 
the gene for identifying the as-acting elements of the e-traits. 

The large number of regulators and the regulatory network 
identified in the study only provide a starting point for under- 
standing the system. Although we have illustrated that some 
are supported by results from previous studies, huge effort is 
needed experimentally to confirm any of the master regula- 
tors and the regulatory network. 

Supplementary data 

Supplementary data is available at JXB online. 

Supplementary Fig. SI. An example of a flag leaf har- 
vested at the heading stage. 

Supplementary Fig. S2. The distribution of e-traits and 
probe sets with locus support in the genome. 

Supplementary Fig. S3. Distribution of R 2 values for cis- 
eQTLs and trans-eQTLs, individually. 

Supplementary Fig. S4. LOD curve for expression varia- 
tion that was calculated with the full set of probe sets and 
modified probe sets of RFT1 and Hd3a on chromosome 6. 
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Supplementary Fig. S5. LOD curve for expression varia- 
tion, which was calculated with the full set of probe sets and 
modified probe sets of RFT1 and Hd3ct on chromosome 7. 

Supplementary Fig. S6. Gene expression levels for RFT1 
and Hd3a in Zhenshan 97 and Minghui 63 of three biological 
replicates. 

Supplementary Fig. S7. Heatmap of the global development 
stage of the flowering-related genes identified in our network. 

Supplementary Table SI. Primers used in qPCR analy- 
sis for the candidate targets expression in NIL(mh7) and 
Zhenshan 97. 

Supplementary Table S2. Numbers of observed and 
expected e-traits in 1619 bins. 

Supplementary Table S3. Details of 13 647 significant gene 
expression quantitative trait loci (eQTLs) (P>0.05), affecting 
the expression of 10 725 rice unique probe sets. 

Supplementary Table S4. Information on observed trans- 
eQTLs, expected trans-eQTLs, and e-traits with ris-QTLs in 
1619 bins. 

Supplementary Table S5. The significant master candidate 
regulators. 

Supplementary Table S6. The enrichment gene ontology 
(GO) terms for significantly associated targets for six master 
regulators. 

Supplementary Table S7. Information on 577 unique loci 
involved in flowering time and development. 

Supplementary Table S8. Cis and trans-eQTL identifica- 
tion of flowering-associated genes. 

Supplementary Table S9. Total number of unique correlation 
pairs: 7808 for 61 candidate regulators and 171 potential targets. 

Supplementary Table S10. Gene expression for RFT1 and 
Hd3a was recalculated with the full set of probe sets and 
modified probe sets with probes with less than five SNPs 
removed between RFT1 and Hd3a, 

Supplementary Table SI 1 . Expression correlation for regu- 
lator and corresponding targets. 

Supplementary Table S12. The statistics of 25 pQTLs iden- 
tified using the ultrahigh-density SNP bin map. 

Supplementary Table SI 3. The most significant candidate 
genes for each trait. 

Supplementary Table SI 4. Correlation between phenotypic 
traits and e-traits with cz's-eQTL in the SI of QTL controlling 
corresponding traits. 
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